The key enzyme in coronavirus polyprotein processing
is the viral main proteinase, Mpro, a protein with
extremely low sequence similarity to other viral and
cellular proteinases. Here, the crystal structure of the
33.1 kDa transmissible gastroenteritis (corona)virus
Mpro is reported. The structure was refined to 1.96 Å
resolution and revealed three dimers in the asymmetric
unit. The mutual arrangement of the protomers
in each of the dimers suggests that Mpro
self-processing occurs in trans. The active site, comprised
of Cys144 and His41, is part of a chymotrypsin-like
fold that is connected by a 16 residue loop to an
extra domain featuring a novel α-helical fold.
Molecular modelling and mutagenesis data implicate
the loop in substrate binding and elucidate S1 and S2
subsites suitable to accommodate the side chains of
the P1 glutamine and P2 leucine residues of Mpro substrates.
Interactions involving the N-terminus and the
α-helical domain stabilize the loop in the orientation
required for trans-cleavage activity. The study illustrates
that RNA viruses have evolved unprecedented
variations of the classical chymotrypsin fold.
Keywords: 3C-like/catalytic dyad/coronavirus/proteinase/ 
X-ray crystallography 


Introduction 


Transmissible gastroenteritis virus (TGEV) belongs to the 
Coronaviridae, a family of positive-strand RNA viruses. 
Coronaviruses have the largest RNA viral genomes known 
to date (28 500 nucleotides in the case of TGEV) and share 
a similar genome organization and common transcrip- 
tional and translational strategies with the Arteriviridae 
(den Boon et al., 1991; Cavanagh, 1997). TGEV infection 
is associated with severe and often fatal diarrhoea in young 
pigs (for reviews see Enjuanes and van der Zeijst, 1995; 
Saif and Wesley, 1999). 

The viral proteins required for TGEV genome replic- 
ation and transcription are encoded by the replicase gene 
(Eleouet et al., 1995; Penzes et al., 2001). This gene 
encodes two replicative polyproteins, pp1a (447 kDa) and
pp1ab (754 kDa), that are processed by virus-encoded
proteinases to produce the functional subunits of the
replication complex (reviewed in Ziebuhr et al., 2000).
The central and C-proximal regions of pp1a and pp1ab are
processed by a 33.1 kDa viral cysteine proteinase which is
called the ‘main proteinase’ (Mpro) or, alternatively, the
‘3C-like proteinase’ (3CLpro). The name ‘3C-like proteinase’
was introduced originally because of similar substrate
specificities of the coronavirus Mpro and picornavirus 3C
proteinases (3Cpro) and the identification of cysteine as the
principal catalytic residue in the context of a predicted
two-β-barrel fold (Gorbalenya et al., 1989a,b). Meanwhile,
however, several studies have revealed significant differences
in both the active sites and domain structures
between the coronavirus and picornavirus enzymes (Liu
and Brown, 1995; Lu and Denison, 1997; Ziebuhr et al.,
1997, 2000; Hegyi et al., 2002). Also, the crystal structures
reported for a number of picornavirus 3C proteinases
(Allaire et al., 1994; Matthews et al., 1994; Bergmann
et al., 1997; Mosimann et al., 1997) have not been useful
in predicting the three-dimensional structures of coronavirus
main proteinases. Because of the large phylogenetic
distance between the two groups of enzymes, we will use
the term coronavirus Mpro throughout this article.

Sequence comparisons (Figure 1) and experimental data 
obtained for other coronavirus homologues allow us to 
predict that the mature form of the TGEV Mpro is released
from pp1a and pp1ab by autoproteolytic cleavage at
flanking Gln↓(Ser,Ala) sites (Eleouet et al., 1995; Hegyi
and Ziebuhr, 2002). Accordingly, the TGEV Mpro has 302
amino acid residues that correspond to the pp1a/pp1ab
residues 2879-3180. In vivo and in vitro analyses of avian
infectious bronchitis virus (IBV), mouse hepatitis virus
(MHV) and human coronavirus 229E (HCoV 229E) Mpro
activities have shown consistently that the proteinase
cleaves the replicase polyproteins at 11 conserved sites
and, therefore, it seems reasonable to conclude that the
Mpro-mediated processing pathways are conserved in all
coronaviruses, including TGEV. 
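
The relationship between polyprotein numbering and mature-enzyme numbering stated above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and constant names are illustrative and do not come from the paper.

```python
# Position of the mature TGEV Mpro within pp1a/pp1ab, as stated in the text.
MPRO_START_IN_PP1A = 2879
MPRO_END_IN_PP1A = 3180


def mpro_length() -> int:
    """Number of residues in the mature proteinase (inclusive range)."""
    return MPRO_END_IN_PP1A - MPRO_START_IN_PP1A + 1


def pp1a_to_mpro(pp1a_residue: int) -> int:
    """Convert a pp1a/pp1ab residue number to Mpro numbering (1-based)."""
    if not MPRO_START_IN_PP1A <= pp1a_residue <= MPRO_END_IN_PP1A:
        raise ValueError("residue lies outside the Mpro region of pp1a")
    return pp1a_residue - MPRO_START_IN_PP1A + 1


def mpro_to_pp1a(mpro_residue: int) -> int:
    """Convert an Mpro residue number (1-based) to pp1a/pp1ab numbering."""
    if not 1 <= mpro_residue <= mpro_length():
        raise ValueError("residue lies outside the mature proteinase")
    return mpro_residue + MPRO_START_IN_PP1A - 1


if __name__ == "__main__":
    print(mpro_length())        # length of the mature enzyme
    print(mpro_to_pp1a(144))    # catalytic cysteine in polyprotein numbering
```

With these bounds the inclusive range contains 302 residues, in agreement with the length given in the text.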

Previous theoretical studies and experimental data have 
led to the following conclusions (Bazan and Fletterick, 
1988; Gorbalenya et al., 1989a,b; Liu and Brown, 1995;
Lu et al., 1995; Ziebuhr et al., 1995, 1997, 2000; Lu and
Denison, 1997; Seybert et al., 1997; Ziebuhr and Siddell,
1999; Ng and Liu, 2000; Hegyi et al., 2002): (i) Coronavirus
main proteinases employ conserved cysteine and
histidine residues in the catalytic site. In TGEV Mpro, these
are Cys144 and His41. There has been some debate on the
existence of a third residue in the catalytic centre. In
common with picornavirus 3C proteinases, the catalytic
centre of the coronavirus Mpro is predicted to be embedded
in a chymotrypsin-like, two-β-barrel structure in which
cysteine (rather than serine) serves as the principal 
nucleophile. (ii) Coronavirus main proteinases have 
well-defined substrate specificities. All known cleavage 
sites contain bulky hydrophobic residues (mainly leucine) 
at the P2 position, glutamine at the P1 position, and small 
Fig. 1. Sequence comparison of coronavirus main proteinases. The alignment was produced using CLUSTAL X, version 1.81 (Thompson et al., 1997),
and corrected manually on the basis of the three-dimensional structure of TGEV Mpro. The corresponding sequences of FIPV (strain 79-1146), HCoV
(strain 229E), bovine coronavirus (BCoV, isolate LUN), MHV (strain JHM) and IBV (strain Beaudette) were derived from the replicative polyproteins
of the respective viruses whose sequences are deposited at the DDBJ/EMBL/GenBank database (accession Nos: FIPV, AF326575; HCoV, X69721;
BCoV, AF391542; MHV, M55148; IBV, M95169; TGEV, AJ271965). The β-strands and α-helices as revealed in the TGEV Mpro crystal structure
(this study) are shown above the sequence alignment (see also Figures 4 and 5). Black background colour indicates the catalytic cysteine and histidine
residues. Grey background colour indicates the key residue of the S1 subsite (TGEV Mpro His162) and its equivalents in other coronavirus main
proteinases. Also shown in grey are the phenylalanine and tyrosine residues (TGEV Mpro Phe139 and Tyr160) that are proposed to stabilize the neutral
state of His162 (see text for details).


aliphatic residues at the P1′ position. (iii) Coronavirus
main proteinases possess a large C-terminal domain of 
~110 amino acid residues that is not found in other RNA 
virus 3C-like proteinases. The characterization of recom- 
binant proteins, in which 33, 28 and 34 C-terminal amino 
acid residues were deleted from the IBV, MHV and HCoV 
main proteinases, respectively, resulted consistently in 
dramatic losses of proteolytic activity, suggesting that the 
C-terminal domain of Mpro contributes to proteolytic
activity through undefined mechanisms. 
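
The substrate specificity summarized in point (ii) can be expressed as a simple sequence scan. The sketch below is illustrative only: the residue sets chosen for "bulky hydrophobic" at P2 and "small aliphatic" at P1′ are assumptions made for the example, not a definition taken from the paper, and the test peptide is the 15mer substrate listed in Table I.

```python
import re

# Assumed residue classes (one-letter codes); the text only says
# "bulky hydrophobic (mainly leucine)" for P2 and "small aliphatic" for P1'.
P2_HYDROPHOBIC = "LIVMF"
P1_PRIME_SMALL = "SAG"

# P2 residue, P1 glutamine, then a lookahead for P1' so that the match
# ends exactly at the scissile bond.
SITE = re.compile(rf"[{P2_HYDROPHOBIC}]Q(?=[{P1_PRIME_SMALL}])")


def predicted_p1_positions(sequence: str) -> list[int]:
    """Return 1-based positions of P1 glutamines that fit the motif."""
    return [match.end() for match in SITE.finditer(sequence.upper())]


def cleave(sequence: str) -> list[str]:
    """Split a sequence after every predicted P1 glutamine."""
    fragments, start = [], 0
    for position in predicted_p1_positions(sequence):
        fragments.append(sequence[start:position])
        start = position
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    peptide = "VSVNSTLQSGLRKMA"  # 15mer substrate peptide from Table I
    print(predicted_p1_positions(peptide))
    print(cleave(peptide))
```

A pattern of this kind only flags candidate sites; it says nothing about accessibility or cleavage efficiency.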

The 1.96 Å TGEV Mpro crystal structure reported herein
reveals the structural details of a unique catalytic system 
and facilitates the interpretation of previously published 
mutagenesis studies that have, at least in part, remained 
speculative due to the complete lack of structural inform- 
ation on ‘3C-like’ enzymes. 


Results and discussion 


Structure determination by MAD phasing 
The presence of 10 methionine residues in the TGEV Mpro
molecule suggested that selenomethionine-based multiwavelength
anomalous dispersion (MAD; Hendrickson
et al., 1990) could be used to solve the phase problem. The
unit cell dimensions of the crystals (a = 72.8 Å, b = 160.1 Å,
c = 88.9 Å, β = 94.3°, space group P2₁) and self-rotation
calculations indicated the presence of as many as six
TGEV Mpro molecules per asymmetric unit. In the MAD
phasing process, we finally succeeded in locating 48 (out 
of 60) crystallographically independent selenium sites by 
the ‘Shake & Bake’ approach to direct methods (Weeks 
and Miller, 1999), without recourse to heavy atom 
derivatives or other methods of phasing (see Materials 
and methods). The phases obtained resulted in a readily 
interpretable electron density map. 
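
As a plausibility check on six molecules per asymmetric unit, the packing density can be estimated from the unit cell quoted above. This is a minimal sketch and not the authors' procedure: it assumes the standard Matthews coefficient, two asymmetric units per cell for space group P2₁, and the common approximation that the solvent fraction is 1 − 1.23/V_M. All function names are illustrative.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Volume (Å^3) of a monoclinic unit cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume: float, mass_da: float,
                         copies_per_asu: int, asu_per_cell: int) -> float:
    """Crystal volume per unit of protein mass (Å^3 per Da)."""
    return volume / (mass_da * copies_per_asu * asu_per_cell)


# Values taken from the text; P2(1) has two asymmetric units per unit cell.
volume = monoclinic_cell_volume(72.8, 160.1, 88.9, 94.3)
vm = matthews_coefficient(volume, mass_da=33100.0, copies_per_asu=6, asu_per_cell=2)

# Common approximation for the solvent fraction of a protein crystal.
solvent_fraction = 1.0 - 1.23 / vm

# Selenium bookkeeping: 10 methionines per molecule, six molecules.
expected_se_sites = 10 * 6
located_fraction = 48 / expected_se_sites

if __name__ == "__main__":
    print(f"cell volume      = {volume:,.0f} Å^3")
    print(f"V_M              = {vm:.2f} Å^3/Da")
    print(f"solvent fraction = {solvent_fraction:.2f}")
    print(f"Se sites located = {located_fraction:.0%} of {expected_se_sites}")
```

Under these assumptions, six copies give a V_M of roughly 2.6 Å³/Da, which lies within the range usually observed for protein crystals.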


Quality of the model 

All six copies (designated A-F) of the TGEV Mpro in the
asymmetric unit of the crystal could be built into well- 
defined electron density (Figure 2), which covered almost 
all of the 302 amino acid residues of each monomer. The 
only exceptions were the two C-terminal residues which 
were not visible in five of the six chains. Monomers A, E 
and F also lacked electron density for residue 300. 




Structure of TGEV main proteinase 


Fig. 2. Stereo view of a representative part of the electron density map. The 2|Fo| − |Fc| electron density map (1.96 Å resolution, contoured at 1σ
above the mean) corresponds to Mpro residues 160-162 (Tyr–Met–His), a conserved motif in coronavirus main proteinases. The strong hydrogen
bonding interaction between the Tyr160 hydroxyl group and His162 Nδ1 is indicated.


The final model comprises 1798 amino acid residues 
and 1006 water molecules, as well as 27 sulfate ions, 
nine dioxane molecules and six 2-methyl-2,4-pentanediol 
(MPD) molecules from the crystallization medium. The 
refinement converged to a final R-factor of 0.210 and an 
Rfree (Brünger, 1992) of 0.256, with good stereochemistry.
Altogether, 88.4% of the amino acid residues were found 
in the most favoured regions of the Ramachandran plot, 
and 10.8% were in additionally allowed regions. Residues 
Asn70, Asn71 and Ser279 were in regions only generously 
allowed, but had clear electron density. 


Domain structure 

The six TGEV Mpro monomers present in the asymmetric
unit are arranged in three dimers (Figure 3). Each
monomer is folded into three domains, the first two of
which are antiparallel β-barrels reminiscent of those found
in serine proteinases of the chymotrypsin family (Figure 4).
Residues 8-100 form domain I, and residues 101-183
make up domain II. The connection to the C-terminal
domain III is formed by a long loop comprising residues
184-199. Domain III (residues 200-302) contains a novel
arrangement of five α-helices. A deep cleft between
domains I and II, lined by hydrophobic residues, consti- 
tutes the substrate-binding site. The catalytic site is 
situated at the centre of the cleft. 
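
The domain boundaries given above lend themselves to a small lookup that maps any residue number to its structural region. This is an illustrative sketch; the names are not from the paper, and residues 1-7, which the text does not assign to a domain, are labelled as the N-terminal segment here by assumption.

```python
# Region boundaries of TGEV Mpro as given in the text (1-based, inclusive).
REGIONS = [
    (8, 100, "domain I"),
    (101, 183, "domain II"),
    (184, 199, "linker loop"),
    (200, 302, "domain III"),
]

MPRO_LENGTH = 302


def region_of(residue: int) -> str:
    """Return the structural region that contains the given residue."""
    if not 1 <= residue <= MPRO_LENGTH:
        raise ValueError(f"residue {residue} is outside 1-{MPRO_LENGTH}")
    for start, end, name in REGIONS:
        if start <= residue <= end:
            return name
    # Residues 1-7 precede domain I; this label is an assumption of the sketch.
    return "N-terminal segment"


if __name__ == "__main__":
    for label, number in [("His41", 41), ("Cys144", 144),
                          ("Asp186", 186), ("Glu286", 286)]:
        print(f"{label}: {region_of(number)}")
```

The lookup makes explicit that the two catalytic residues sit in different β-barrel domains, on either side of the cleft.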

The interior of the β-barrel of domain I consists entirely
of hydrophobic residues. A short α-helix (helix A;
Tyr53–Ser58) closes the barrel like a lid. Domain II is
smaller than domain I and also smaller than the
homologous domain II of chymotrypsin and hepatitis A
virus (HAV) 3Cpro (Tsukada and Blow, 1985; Allaire et al.,
1994; Bergmann et al., 1997). Several secondary structure
elements of HAV 3Cpro (strands b2II and cII and the
intervening loop) are missing in the TGEV Mpro. Also, the
domain II barrel of the TGEV Mpro is far from perfect
(Figure 4). The segment from Gly135 to Ser146 forms a
part of the barrel, even though it consists mostly of
consecutive loops and turns. In fact, in contrast to domain
I, a structural alignment of domain II has proven difficult.
The superposition of domains I and II of the TGEV Mpro
onto those of the HAV 3Cpro yields an r.m.s.d. of
1.85 ± 0.05 Å for 114 equivalent (out of 184 compared)
Cα pairs, while domain II alone displays an r.m.s.d. of
3.25 ± 0.28 Å for 57 (out of 85) Cα pairs.

Domain III is composed of five, mostly antiparallel, 
α-helices and the loops connecting them. The crossover
angles are ~90° between helices B and E, ~30° between B
and D, ~20° between C and E, and ~80° between E and F,
whereas C–B and B–F are parallel to each other (see
Figure 5). Interhelical contacts are mediated by hydrophobic
side chains. The loops between the helices are quite
long and fill up most of the interstitial space of domain III.
Database searches (Holm and Sander, 1993; Gilbert et al.,
1999) did not reveal other proteins or protein domains with
the same topology as domain III. The N-terminal segment
(residues 1–5) of the polypeptide chain folds onto domain
II, placing the N-terminus of the protein within 17.0
(±2.7) Å of the C-terminus (Figure 4).

The six copies of the TGEV Mpro in the asymmetric unit
of the crystal are highly similar. The core regions of
domains I and II display an r.m.s.d. of 0.29 (±0.09) Å for
130 equivalent Cα atoms (monomer A as a reference;
herein, geometrical values given are the r.m.s. over the six
monomers, with the corresponding standard deviation). If
all 299 well-determined Cα positions are included, the
average r.m.s.d. for all monomers is 0.57 (±0.18) Å. The
largest deviations of the main chain trace are in: (i) the
N-terminal segment from residues 1 to 4 (average r.m.s.d.
1.69 ± 0.91 Å); (ii) the flexible surface loop from residues
216 to 225 (average r.m.s.d. 0.99 ± 0.51 Å); (iii) the
C-terminus of helix E and the loop region between
residues 267 and 276 (average r.m.s.d. 0.99 ± 0.42 Å); and
(iv) the segment 294-300 following the C-terminal F helix
(average r.m.s.d. 1.55 ± 0.44 Å). In addition to being
flexible and at the surface of the molecules, segments 
Fig. 3. Stereo depiction of the six molecules (three dimers) of TGEV Mpro in the asymmetric unit. The monomers A-F are shown in different colours;
A = red, B = black, C = green, D = orange-red, E = yellow and F = cyan. Note the 2-fold symmetry axes between the monomers in each of the dimers,
and between the two lower dimers in the figure (AB and EF). Each of the monomers measures ~70 Å × 22 Å × 40 Å.


(ii) and (iii) are involved in interdimer crystal contacts in 
some but not all of the six protomers. Surprisingly, the 
regions with the highest r.m.s.d. are not the regions with 
the highest temperature factors, except for the C-terminal 
domain of monomer F which does have high temperature 
factors (~70 Å²; whole model 47 Å², including all 1006
water molecules). 
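
The r.m.s.d. values quoted in this section compare equivalent Cα positions after superposition. A minimal sketch of that calculation follows. It assumes the two coordinate sets are already superimposed and paired residue by residue (no fitting step is performed), and it summarizes per-monomer values with a sample standard deviation, which is an assumption because the text does not say which estimator was used. The coordinates in the usage example are invented for illustration.

```python
import math
from statistics import mean, stdev

Point = tuple[float, float, float]


def ca_rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """Root-mean-square deviation (Å) between paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def summarize(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-monomer values."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the structure.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    other = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, -0.2, 0.0)]
    print(f"r.m.s.d. = {ca_rmsd(reference, other):.2f} Å")
```

In practice the superposition itself (for example a least-squares fit) has to be carried out first; only the final distance statistic is shown here.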


Active site 

The active site of the coronavirus Mpro is similar to those of
the picornavirus 3C proteinases, as had been predicted
earlier (Gorbalenya et al., 1989b). The mutual arrangement
of the nucleophilic Cys144 and the general acid–base
catalyst His41 of TGEV Mpro is identical to that of the
HAV 3Cpro Cys172 and His44 residues and the Ser195 and
His57 residues of chymotrypsin. The distance between the
sulfur atom of Cys144 and the Nε2 of His41 is 4.05
(±0.04) Å, i.e. longer than the corresponding cysteine–
histidine distances in HAV 3Cpro (3.92 Å; Bergmann
et al., 1997), poliovirus (PV) 3Cpro (3.4 Å; Mosimann et al.,
1997) and papain (3.65 Å; Kamphuis et al., 1984)
(Figure 6B and C). In contrast to papain, but in agreement
with the picornavirus 3C proteinases, the sulfur atom
is in the plane of the histidine imidazole. There are
clear indications from the difference Fourier synthesis
(Figure 6A) that Cys144 is oxidized, at least to the stage of
the sulfinic acid, –SO₂⁻, and probably to the sulfonic acid,
–SO₃⁻, in all six copies of TGEV Mpro in the crystal. Such
oxidation could occur during the time required for 
crystallization or during X-ray data collection, and would 
lead to inactivation of the enzyme. Refinement of the 
corresponding derivatives was, however, not successful. 
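
The distance comparison made in this paragraph can be laid out explicitly. The sketch below simply tabulates the nucleophile-to-histidine distances quoted in the text and reports how much longer the TGEV Mpro distance is than each of the others; the helper for computing a distance from coordinates is included for completeness, and all names are illustrative.

```python
import math

# Cysteine S to histidine N distances (Å) as quoted in the text.
DISTANCES = {
    "TGEV Mpro": 4.05,
    "HAV 3Cpro": 3.92,
    "PV 3Cpro": 3.4,
    "papain": 3.65,
}


def interatomic_distance(atom_a, atom_b) -> float:
    """Euclidean distance between two (x, y, z) positions in Å."""
    return math.dist(atom_a, atom_b)


def excess_over(reference: str, distances: dict[str, float]) -> dict[str, float]:
    """How much longer the reference distance is than each other entry."""
    ref = distances[reference]
    return {name: round(ref - value, 2)
            for name, value in distances.items() if name != reference}


if __name__ == "__main__":
    for enzyme, delta in excess_over("TGEV Mpro", DISTANCES).items():
        print(f"TGEV Mpro distance exceeds {enzyme} by {delta:.2f} Å")
```

Because the active-site cysteine appears to be oxidized in the crystal, differences of this size should be read with that caveat in mind.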
It is generally assumed that the native state of the active
site of papain-like cysteine proteinases is a thiolate–
imidazolium ion pair formed by cysteine and histidine
residues (Polgar, 1974). In proteinases of the papain
family, an asparagine is the third member of the catalytic
triad. Chymotrypsin and other members of this serine
proteinase family have a catalytic triad consisting of
Ser195...His57...Asp102. In HAV 3Cpro, Asp84 is present
at the required position, although its side chain points
away from His44, making its role disputable (Malcolm,
1995; Bergmann et al., 1997). PV 3Cpro, human rhinovirus
(HRV) 3Cpro and HRV 2Apro have a glutamate or aspartate
in the proper orientation to accept a hydrogen bond from
the active site histidine (Matthews et al., 1994; Mosimann
et al., 1997; Petersen et al., 1999). In contrast, TGEV Mpro
has Val84 in the corresponding position, with its side chain
pointing away from the catalytic site (Figure 6B and C). A
buried water molecule is found in the place that normally
would be occupied by the side chain of the third member
of the catalytic triad. This water molecule makes hydrogen
bonds to His41 Nδ1, His163 Nδ1 and Asp186 Oδ1
(Figure 6B). His163 is not conserved among coronavirus
main proteinases and its substitution by leucine (Mpro-H163L)
had no significant effect on the proteolytic activity
in the standard peptide assay (see Materials and methods),
as compared with the activity of the wild-type Mpro
(Table I). Asp186 makes a salt bridge to Arg40 that
appears to be required to maintain the active site geometry,
since both Asp186 and Arg40 are absolutely conserved
among coronaviruses. Through this (and other) interaction(s),
the polypeptide segment 184-199, which connects
domains II and III and is probably involved in
substrate binding (see below), is held in the proper


Fig. 4. A MOLSCRIPT diagram (Kraulis, 1991) showing the overall
fold of TGEV Mpro (A) with the two β-barrel domains and the α-helical
C-terminal domain. β-strands and helices are represented as arrows and
cylinders, respectively. The β-barrels of each domain I and II are composed
of six-stranded β-sheets (green). Domain III is composed mainly
of α-helices (red). The structures of HAV 3Cpro (PDB code: 1HAV) (B)
and α-chymotrypsin (4CHA, residues 12-15 and 147-148 are
excised) (C) are shown for comparison.


position. Taken together, the data contradict a direct 
involvement of His163 or Asp186 in catalysis, making the 
TGEV Mpro a clear case of a viral cysteine proteinase
employing only a catalytic dyad. 

Substrate hydrolysis by cysteine and serine proteinases 
occurs through a covalent tetrahedral intermediate result- 
ing from attack of the active site nucleophile on the 
carbonyl carbon of the scissile bond. The developing 
oxyanion is stabilized by strong hydrogen bonds donated 
by amide groups of the enzyme. This so-called ‘oxyanion 
hole’ is also found in TGEV Mpro. It is made up by the
main chain amides of Gly142, Thr143 and Cys144
(Figure 6B). 




Fig. 5. Topological representation of the secondary structure elements
of a TGEV Mpro monomer. α-helices and β-strands are represented as
cylinders and arrows, respectively. Numbers indicate the N- and
C-terminal residues of the secondary structure elements. Strands bI and
cI are adjacent. Cys144 (yellow) and His41 (blue) are shown by circles.
The positions of the N- and C-termini are indicated. Also, the presumed
localization of the P5–P1 region of a model substrate is shown (blue)
(for details, see text and Figure 7).


Substrate-binding site 

The specificity of Mpro for a very limited range of amino
acids at the P1, P2 and P4 positions resembles the substrate
specificity of picornavirus 3C proteinases (Palmenberg,
1990; Ziebuhr et al., 2000). This leads us to believe that,
similarly to 3Cpro (Matthews et al., 1994; Bergmann et al.,
1997; Mosimann et al., 1997), specific substrate binding
by Mpro is ensured by well-defined S4, S2 and S1
specificity pockets. In order to visualize potential interactions
with the substrate, we have modelled a pentapeptide
representing the P5–P1 residues of a TGEV Mpro
cleavage site (Asn–Ser–Thr–Leu–Gln, pp1a amino acids
2874-2878; Hegyi and Ziebuhr, 2002) into the substrate-
binding cleft of Mpro (Figure 7). The model is based on
the assumption that Mpro binds substrates in a manner
analogous to that found in complexes of chymotrypsin-like
proteinases with peptide inhibitors. X-ray structures have
shown that the P4–P1 residues of peptide inhibitors
assume a common main chain conformation when bound
to these proteinases, with the P4 and P3 residues adopting
a β conformation and the P2 and P1 residues assuming a
specific main-chain conformation suitable to place their
side chains in the pre-formed S1 and S2 specificity pockets
Fig. 6. Active site of the TGEV Mpro. (A) Difference electron density (|Fo| − |Fc| at 3.0σ above the mean; red) for the oxidized active site Cys144, indicating
three oxygen atoms bound to the sulfur. (B) The catalytic Cys144 and His41 residues are shown. The region forming the oxyanion hole (main
chain amides of Gly142, Thr143 and Cys144) is highlighted in pink. The water molecule, which occupies a position equivalent to that of the catalytic
aspartate of serine proteinases, is shown together with its hydrogen-bonding partners, His41, His163 and Asp186. (C) Superposition of the active site
residues of chymotrypsin (shown in red) with the spatially equivalent residues of TGEV Mpro (blue) and HAV 3Cpro (green). The equivalent to the
third catalytic residue (Asp102) of chymotrypsin is Asp84 in HAV 3Cpro (side chain oriented differently) and Val84 in TGEV Mpro.


Table I. Enzymatic activities of TGEV Mpro mutants

Plasmid                     Protein               Mpro amino acids              Activity (%)a
    Oligonucleotides used for cloning or mutagenesis (5′→3′)

pMal-Mpro                   Mpro                  Ser1–Gln302                   100
    TCAGGTTTGCGGAAAATGGCAC,
    AAAAGGATCCTTACTGAAGATTTACACCATACATTTG

pMal-MproΔ184–302           MproΔ184–302          Ser1–Gly183                   <0.02
    TCAGGTTTGCGGAAAATGGCAC,
    AAAGGATCCTTAACCACCGTACATTTCTCCTTCAAAATT

pMal-MproΔ200–302           MproΔ200–302          Ser1–Ser199                   0.4
    TCAGGTTTGCGGAAAATGGCAC,
    AAAGGATCCTTATGACATGACATTAGTACCTTCCAATTG

pMal-MproΔ1–5/Δ200–302      MproΔ1–5/Δ200–302     Met6–Ser199                   0.6
    ATGGCACAGCCTAGTGGTCTTGTA,
    AAAGGATCCTTATGACATGACATTAGTACCTTCCAATTG

pMal-MproΔ1–5               MproΔ1–5              Met6–Gln302                   0.3
    ATGGCACAGCCTAGTGGTCTTGTA,
    AAAAGGATCCTTACTGAAGATTTACACCATACATTTG

pMal-Mpro-H163L             Mpro-H163L            Ser1–Gln302 (His163→Leu)      98
    GTATACATGCATCTCTTAGAACTTGGAAATGGCTCGCAT,
    TCCAAGTTCTAAGAGATGCATGTATACAAAATAGAGAAT

pMal-Mpro-C144A             Mpro-C144A            Ser1–Gln302 (Cys144→Ala)      <0.02
    AGCTGGTACTGCTGGATCAGTAGGTTATGTGTTAGAA,
    CTACTGATCCAGCAGTACCAGCTATAAAAGATCCTTT

aProteolytic activities were determined using a peptide-based cleavage assay (Ziebuhr et al., 1997; see Materials and methods).
The sequence of the 15mer substrate peptide, H2N-VSVNSTLQSGLRKMA-COOH, was derived from the N-terminal Mpro autoprocessing site
(the scissile bond lies between Gln and Ser, which are shown in bold in the original). The activity of wild-type Mpro (encompassing 302 residues)
was taken as 100% and the mean value of three experiments, which did not vary by more than 15%, is shown.


(James et al., 1980; Fujinaga et al., 1985, 1987; Matthews
et al., 1999). These studies lead us to suggest that the
residues P5 to P3 of Mpro substrates may form an
antiparallel β-sheet with segment 164-167 of the long
strand eII on one side, and with the segment 186-191
(which links domains II and III) on the other. Hydrogen
bonding interactions are likely between the main chain
amide and carbonyl oxygen atoms of substrate residues
Thr(P3), Ser(P4) and Asn(P5) and the main chain atoms of
TGEV Mpro residues Glu165, Ser189 and Gly167 (see
Figure 7).
S1 subsite

It has been shown for the HAV, HRV and PV 3Cpro
enzymes that the imidazole side chain of a conserved
histidine, which is located in the centre of a hydrophobic
pocket, interacts with the P1 carboxamide side chain of the
substrate. This interaction is generally accepted to determine
the picornavirus 3Cpro specificity for glutamine at P1
(Matthews et al., 1994, 1999; Bergmann et al., 1997;
Mosimann et al., 1997). Mutational analyses revealed that
any replacement of His162 completely abolished the
proteolytic activities of the HCoV and feline infectious




Fig. 7. Stereo diagram of a P5–P1 substrate (Asn–Ser–Thr–Leu–Gln, red; corresponding to the TGEV Mpro N-terminal autoprocessing site) modelled
into the active site cleft of the TGEV Mpro. Hydrogen bonds are depicted by dotted lines.


peritonitis virus (FIPV) Mpro enzymes (Ziebuhr et al.,
1997; Hegyi et al., 2002). The structure shows that the
imidazole side chain of His162 is positioned suitably to
interact with a P1 glutamine side chain. His162 is located
at the very bottom of a hydrophobic pocket which is
formed by residues Phe139 and the main-chain atoms of
Ile140, Leu164, Glu165 and His171. The side chain of
Glu165 forms an ion pair (2.96 ± 0.14 Å) with His171.
This salt bridge is itself on the periphery of the molecule,
forming part of the ‘outer wall’ of the S1 subsite.
Accordingly, mutants of the HCoV 229E Mpro, in which
the residue equivalent to His171 had been replaced by
alanine, serine or threonine, retained significant proteolytic
activities (Ziebuhr et al., 1997). In order to interact
with the P1 glutamine side chain of the substrate, His162
has to maintain a neutral state over a wide pH range. Most
probably, this is achieved by two important interactions:
(i) stacking onto the phenyl ring of Phe139, at a distance of
3.53 ± 0.18 Å; and (ii) accepting a hydrogen bond from
the buried Tyr160 hydroxyl group which has no other
hydrogen-bonding partner. The role proposed for the
hydroxyl group of Tyr160 is strongly supported by FIPV
Mpro mutagenesis studies in which the proteolytic activities
of Y160F, Y160G, Y160A and Y160T mutants were
shown to be dramatically reduced (Hegyi et al., 2002).
Tyr160 is part of the absolutely conserved coronavirus
Mpro sequence signature, Tyr160-X-His162 (Figures 1 and
2), whereas Gly(Ala)-X-His is found at the equivalent
sequence position in most 3C and 3C-like proteinases
(Gorbalenya et al., 1989a). Accordingly, in the 3C and 3C-
like proteinases, stabilization of histidine in the neutral
tautomeric state has to be ensured by other residues.
Notably, in the case of PV 3Cpro, this involves a tyrosine
residue (Tyr138) which, however, is provided by a
different part of the structure (β-strand cII; Mosimann
et al., 1997). For HAV 3Cpro, other mechanisms are
proposed (Bergmann et al., 1997).


Halfway down the S1 subsite of TGEV Mpro, there is
dumbbell-shaped electron density which we have assigned
to two water molecules, although theoretically they are too
close to one another (2.10 ± 0.16 Å). One of them makes a
hydrogen bond with Nε2 of His162, while the second one,
unusually for water, makes no additional contacts. In our
model of the substrate complex, these two water molecules
mark the position of the carboxamide group of the P1
glutamine side chain.


S2 subsite 

Coronavirus main proteinases have a strong preference for 
leucine at the P2 position (Ziebuhr et al., 2000). The 
putative S2 subsite identified in the structure is a 
hydrophobic pocket that is suitably positioned and large 
enough to accommodate a leucine side chain easily. The 
S2 pocket is lined by the side chains of Leu164 (the main 
chain of which forms part of the S1 subsite, see above), 
Pro188, Ile51, His41 and Thr47 (Figure 7). In our electron
density maps, part of the S2 subsite (of all six copies of the
monomer) harbours extra electron density that we interpreted
as an MPD molecule from the crystallization
medium. In the HAV 3Cpro, the corresponding subsite is
formed by different parts of the polypeptide chain. It is 
also smaller and can accommodate the side chains of 
serine and threonine (Bergmann et al., 1997). 


Quaternary structure 

The quaternary arrangement of the proteinase is a 
homodimer, with three copies in the asymmetric unit 
(monomers A and B, C and D, and E and F). All dimers 
have approximate C2 symmetry (Figure 3) and ~1580
(±199) Å² of each monomer, i.e. 11-12% of its solvent-
accessible surface, are buried upon dimerization. The 
dimer formation is driven mainly by intermolecular 
interactions between domains II and III of one monomer 
and the N-terminal residues of the other (see below for 
Fig. 8. Intra- and intermolecular contacts of the TGEV Mpro N-terminus. (A) MOLSCRIPT stereo representation of a TGEV Mpro dimer. Molecule A
is coloured from blue at the N-terminus, via green (domain II), to red (C-terminus), while molecule B is shown in grey. The catalytic Cys144 and
His41 residues are labelled in both monomers. (B) Detailed view of the interactions made by the N-terminal segment (blue) and domains II/III of
monomer A as well as domains I/III of monomer B. Residues critically involved in these interactions are designated by the single-letter code and
shown in ball-and-stick representation (see text for details). The N- and C-termini of molecule A are indicated.


further details). In contrast, the domain III-domain III 
interface appears to be the consequence rather than the 
cause of other intermolecular interactions. It involves a 
relatively small area of 337 ± 45 Å² and comprises only
two hydrogen bonds, between the amide group of
Gly281 (molecule A) and the main-chain oxygen of
Ser279 (molecule B), as well as its symmetry mate,
Gly281B...Ser279A (3.22 ± 0.37 Å, averaged over all six
monomers). 
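
The interface figures quoted for the dimer imply a total solvent-accessible surface per monomer that can be recovered by simple division. The sketch below performs that arithmetic and also expresses the domain III-domain III contact area as a ratio to the buried area per monomer. The ratio is given only for scale; whether the two areas were measured on the same basis is not stated in the text and is an assumption here. Function names are illustrative.

```python
def implied_total_sasa(buried_area: float, fraction_low: float,
                       fraction_high: float) -> tuple[float, float]:
    """Range of total monomer surface (Å^2) implied by a buried area
    that makes up between fraction_low and fraction_high of it."""
    if not 0 < fraction_low <= fraction_high < 1:
        raise ValueError("fractions must satisfy 0 < low <= high < 1")
    return buried_area / fraction_high, buried_area / fraction_low


def area_ratio(part: float, whole: float) -> float:
    """Ratio of one interface area to another."""
    return part / whole


# Values quoted in the text.
BURIED_PER_MONOMER = 1580.0      # Å^2, buried upon dimerization
DOMAIN_III_INTERFACE = 337.0     # Å^2, domain III-domain III contact

low, high = implied_total_sasa(BURIED_PER_MONOMER, 0.11, 0.12)
ratio = area_ratio(DOMAIN_III_INTERFACE, BURIED_PER_MONOMER)

if __name__ == "__main__":
    print(f"implied monomer surface: {low:,.0f}-{high:,.0f} Å^2")
    print(f"domain III contact relative to buried area: {ratio:.0%}")
```

The implied monomer surface of roughly 13 000-14 000 Å² is consistent with a protein of about 300 residues.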

Interestingly, the N-terminal residues of each monomer 
are relatively close to the substrate-binding site of the 
other monomer in the dimer. The following observations 
for monomer A hold true for all other monomers. The 
NH₃⁺ group of Ser1A, which is the P1′ residue of the
autocleavage reaction of TGEV Mpro, is 11.9 ± 1.6 Å from
the active site Cys144B Sγ of the second molecule in the
dimer but as much as 34.2 ± 0.9 Å away from its own
active site cysteine. Ser1A is in contact with residues
participating in the substrate-binding site of monomer B.
Its NH₃⁺ group makes a salt bridge (4.99 ± 1.04 Å) to the
carboxylate of Glu165B (Figure 8). This glutamate, which
is absolutely conserved among coronaviruses, is part of the
S1 subsite (see above), where it also interacts with His171.
Although these two side chains form the ‘wall’ of the 
specificity site, they have their polar groups oriented 
towards the surface of the proteinase molecule and away 
from the substrate’s P1 glutamine. An intermolecular ionic 
interaction between Arg4A and Glu286B (6.0 ± 0.7 Å)
appears to play a role in positioning the N-terminal
residues. Because of the 2-fold non-crystallographic
symmetry (NCS), the same interaction occurs between
Arg4B and Glu286A. Residues 6A–8A form a short
β-strand interacting with strand cII of monomer B (at
Val124B). Most of the interactions between the
N-terminus of molecule A and the region next to the S1
subsite of molecule B constitute a perfect fit. Given the
fact that the P′ residues in serine and cysteine proteinases
constitute the leaving group of the cleavage reaction and,
in coronavirus main proteinases, are not subject to
stringent specificity requirements, it is quite conceivable
that, after autoproteolysis, the N-terminus of one monomer
slides over the active site of the partner monomer and
adopts the position seen in our crystal structure, i.e. with
Ser1A interacting with Glu165B at the ‘outer wall’ of the


S1 subsite. This, in turn, would suggest that the dimer we 
are seeing corresponds to the product of the autolysis 
reaction and that this occurs in trans. Molecular modelling 
revealed that binding of the MP'° N-terminus in the active 
site cleft of the same molecule would require remodelling 
of the entire N-terminal segment and beyond (residues 
1-13; data not shown), making cleavage in cis less likely. 
There is additional experimental evidence supporting 
these conclusions. First, dilution experiments with MHV 
Mpro translated in vitro contradict cis-cleavage activity (Lu 
et al., 1996). Secondly, the fact that, early in infection, 
Mpro remains part of a relatively stable 150 kDa precursor 
protein in which it is flanked by hydrophobic domains 
(Schiller et al., 1998) argues against rapid autoprocessing 
in cis. The proposed model of intermolecular self- 
processing would imply that components of the replication 
complex could first be anchored to membranes (i.e. the site 
of RNA replication) in an uncleaved form, and only later, 
when the precursor proteins accumulate to high local 
concentrations, will Mpro release itself by intermolecular 
cleavage, thereby triggering the complete spectrum of 
trans-processing reactions. 
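
The reasoning behind the dilution argument can be illustrated with a toy rate model. This is a minimal sketch and is not taken from the cited experiments: it assumes first-order kinetics for cleavage in cis and second-order kinetics for cleavage in trans, and the rate constants `k_cis` and `k_trans` are hypothetical parameters introduced only for illustration.

```python
def fractional_rate(conc, k_cis=0.0, k_trans=0.0):
    """Initial fractional cleavage rate (per unit time) of a precursor P.

    Assumed toy model (not from the paper):
      cis:   d[P]/dt = -k_cis * [P]       -> fractional rate k_cis
      trans: d[P]/dt = -k_trans * [P]**2  -> fractional rate k_trans * [P]
    """
    return k_cis + k_trans * conc


def dilution_effect(conc, dilution_factor, k_cis=0.0, k_trans=0.0):
    """Ratio of the fractional cleavage rate after dilution to that before."""
    before = fractional_rate(conc, k_cis, k_trans)
    if before == 0.0:
        raise ValueError("at least one rate constant must be non-zero")
    after = fractional_rate(conc / dilution_factor, k_cis, k_trans)
    return after / before


if __name__ == "__main__":
    # Pure cis cleavage is insensitive to dilution; pure trans cleavage
    # slows in proportion to the dilution factor.
    print(dilution_effect(1.0, 10.0, k_cis=0.5))
    print(dilution_effect(1.0, 10.0, k_trans=0.5))
```

Under these assumptions, a processing rate that falls on dilution points to an intermolecular reaction, which is the qualitative behaviour the dilution experiments are used to test.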


Intramolecular interactions of the N-terminus 

A specific conformation of the N-terminal segment allows 
it to ‘squeeze’ residues 1–8 in between domains II and III 
of the same monomer and domains II and III of monomer 
B (see above and Figure 8). In this context, the N-terminus 
also interacts with domains II and III of its own protomer. 
For example, the side-chain amino group of Lys5A makes 
strong intramolecular hydrogen bonds with Ser110A Oγ of 
domain II (2.83 ± 0.15 Å), and with the Glu286A main 
chain oxygen (2.80 ± 0.07 Å), as well as with Glu291A 
Oε1 (2.74 ± 0.13 Å) of domain III. Furthermore, the side 
chain of Leu3A completes a hydrophobic patch on domain 
III which includes Phe206A, Ala209A, Phe287A, 
Val292A, the Cg atom of Gln295A and Met296A; 
these residues belong to helices B and F. All sequenced 
members of the coronavirus proteinase family have a 
hydrophobic residue in position 3, while glycine is 
absolutely conserved in position 2 (see Figure 1). The 
latter residue adopts the αL conformation, which is easily 
accessible only to glycine. To investigate the functional 
significance of these interactions, a recombinant protein, 
MproΔ1–5, in which the N-terminal residues Ser1–Lys5 
were removed from the Mpro sequence, was expressed and 
tested for proteolytic activity in a trans-cleavage assay 
using a 15mer peptide representing the N-terminal TGEV 
Mpro autoprocessing site. As shown in Table I, the activity 
of MproΔ1–5 was decreased to only 0.3% of the Mpro 
activity. We conclude from these data that, indeed, 
residues 1–5 may be critically involved in stabilizing the 
mutual orientation of domains II and III and thus, 
indirectly, in maintaining the proper orientation of the 
intervening loop region (residues 184-199). If this 
hypothesis is correct, then the deletion of domain III 
should have similarly detrimental effects on the proteo- 
lytic activity and, in fact, the published data (see 
Introduction) seem to support this conclusion. To 
corroborate this hypothesis further, an additional set of Mpro 
mutants was characterized in which we used the structural 
information to remove domain III completely. In this 
approach, the probability of domain III misfolding, which 
might have been the cause of Mpro inactivation in previous 
studies using randomly ‘truncated’ coronavirus main 
proteinases (Lu and Denison, 1997; Ziebuhr et al., 1997; 
Ng and Liu, 2000), should be significantly reduced. The 
TGEV Mpro deletion mutants tested for activity comprised 
(i) domains I and II (MproΔ184–302); (ii) domains I and II 
together with the entire loop region (MproΔ200–302); or 
(iii) domains I and II combined with the loop region but 
lacking the five N-terminal residues (MproΔ1–5/Δ200–302). 
As Table I shows, MproΔ200–302 had clearly 
detectable (albeit significantly reduced) activity (0.4% of 
Mpro). Similarly, the mutant MproΔ1–5/Δ200–302 had 
significantly reduced activity (0.6% of Mpro). In sharp 
contrast, no activities were detectable for MproΔ184–302 
and the active site mutant, Mpro-C144A (the latter being 
used as a negative control). The fact that residues 184–199 
proved to be indispensable for proteolytic activity supports 
our model of substrate binding (Figure 7) in which 
residues of the loop are predicted to be critically involved 
in the formation of a β-sheet-type structure with the 
substrate (see above). The data also show that an intact 
N-terminus and the C-terminal domain are required for full 
activity. The structure suggests that the additional 
α-helical domain III as well as the N-terminal residues help 
to fix domain II and the loop 184–199 in a catalytically 
competent orientation. It will be interesting to investigate 
whether similar mechanisms are also operating in other 
3C-like proteinases with (smaller) C-terminal domains 
(e.g. arteriviruses and potyviruses; Ziebuhr et al., 2000; 
Hegyi et al., 2002). 
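
To put the relative activities quoted above on a common scale, the following sketch converts them into fold reductions relative to wild-type Mpro. The percentages are those stated in the text; the dictionary keys, the function name and the use of `None` for undetectable activity are illustrative choices and not part of the original analysis.

```python
# Relative activities (% of wild-type Mpro) as quoted in the text.
# None marks constructs for which no activity was detectable.
REL_ACTIVITY_PERCENT = {
    "Mpro": 100.0,
    "Mpro-delta1-5": 0.3,
    "Mpro-delta200-302": 0.4,
    "Mpro-delta1-5/delta200-302": 0.6,
    "Mpro-delta184-302": None,
    "Mpro-C144A": None,
}


def fold_reduction(percent):
    """Return the fold reduction relative to wild type, or None if undetectable."""
    if percent is None:
        return None
    return 100.0 / percent


if __name__ == "__main__":
    for name, pct in REL_ACTIVITY_PERCENT.items():
        fold = fold_reduction(pct)
        label = "not detectable" if fold is None else f"{fold:.0f}-fold lower"
        print(f"{name}: {label}")
```

Treating undetectable activity as `None` rather than zero avoids implying a measured value where the assay only provides a detection limit.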

Beyond its presumed role in proteolytic activity, domain 
III may have other functions, which remain to be 
determined. In contrast to picornavirus 3C proteinases 
for which RNA-binding activities are well established 
(Andino et al., 1993; Leong et al., 1993; Xiang et al., 
1995), the Mpro structure does not support such an activity 
for the coronavirus main proteinase. Thus, calculation of 
the electrostatic potential (Nicholls et al., 1991) does not 
reveal an overall basic character of domain III, nor are 
there distinct patches of basic or aromatic residues (data 
not shown). The same applies to domains I and II. Also, 
the conserved picornavirus sequence motif, KFRDI, 
located between domains I and II, as well as the small 
helices and reverse turns that together form the 
RNA-binding site of HAV 3Cpro (Bergmann et al., 1997) are 
missing in the TGEV Mpro structure. 


Conclusion 

The crystal structure of TGEV Mpro shows that 
coronaviruses have evolved proteinases in which a 
thiolate–imidazolium catalytic dyad has been combined with a 
two-β-barrel fold. This framework is extended further by a 
novel α-helical domain that, together with the N-terminal 
residues 1–5, appears to be involved in proteolytic activity 
by maintaining the proper positioning of the presumed 
substrate-binding loop, 184–199. We are confident that the 
first crystal structure of a non-picornaviral chymotrypsin- 
like cysteine proteinase will facilitate further molecular 
modelling of other members of the huge family of RNA 
viral ‘3C-like’ enzymes for which structural information is 
still lacking. 






Table II. Summary of X-ray diffraction data from crystals of native and SeMet-substituted Mpro

                                                             Peak                       Edge              High              Low
Beamline                                XRD^a                BW7A^b
Data set^e                              Native               P1       P2       P3       E1       E2       H1       H2       L1
Wavelength (Å)^d                        0.99983              0.97487  0.97845  0.97848  0.97864  0.97874  0.95583  0.9080   1.0022
Resolution (Å) (highest resolution bin) 50–1.95 (1.98–1.95)  30–2.8   30–2.8   30–2.8   30–2.8   30–2.8   30–2.8   30–2.8   30–2.8
Completeness (%)^c                      98.9 (97.0)          99.9     98.1     99.7     99.9     99.7     99.7     98.8     97.3
Mosaicity (°)                           0.62                 0.4      0.6      0.7      0.4      0.6      0.4      0.6      0.4
Rmerge (%)^c,f                          4.2 (22.1)           10.5     11.4     10.6     8.1      8.2      8.6      7.2      8.0
Rrim (%)^c,g                            4.6 (27.1)           12.1     13.0     12.3     9.2      8.9      10.2     7.5      10.3
Rpim (%)^c,h                            1.8 (15.2)           6.1      6.6      6.4      4.7      4.5      5.2      3.2      5.4
Redundancy^c                            5.4 (2.9)            3.8      3.8      3.9      3.8      3.9      3.7      3.6      2.9
I/σ(I)^c                                13.5 (4.0)           5.4      4.7      4.8      6.1      4.1      4.1      4.9      2.5

^a X-ray diffraction beamline at ELETTRA, Trieste, equipped with a Mar CCD detector.
^b Wiggler beamline of EMBL at DESY, Hamburg, equipped with a Mar CCD detector.
^c Highest resolution bin in parentheses.
^d The inflection point and peak wavelengths were collected in inverse beam mode, whereas the remote wavelengths were collected at the low energy 
side of the Se edge where there is little anomalous signal and, as a result, no inverse beam data were collected.
^e P1, P2, P3 = peak wavelengths 1, 2 and 3; E1, E2 = edge wavelengths 1 and 2 (point of inflection); H1, H2 = high energy remote wavelengths 1 and 2; 
L1 = low energy remote wavelength.
^f $R_{\mathrm{merge}} = 100 \times \sum_{hkl} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$, where $I_i$ is the observed intensity and $\langle I \rangle$ is the average intensity from multiple measurements.
^g $R_{\mathrm{rim}} = 100 \times \sum_{hkl} [N/(N-1)]^{1/2} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$, where $N$ is the number of times a given reflection has been measured. This quality indicator 
corresponds to an Rsym that is independent of the redundancy of the measurements.
^h $R_{\mathrm{pim}} = 100 \times \sum_{hkl} [1/(N-1)]^{1/2} \sum_i |I_i - \langle I \rangle| / \sum_{hkl} \sum_i I_i$. This factor provides information about the average precision of the data.
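
The three merging statistics defined in the footnotes to Table II can be sketched in Python as follows. This is an illustrative sketch only, not the code used by SCALEPACK or by the authors: the function name and the dictionary input layout are assumptions, and reflections measured only once are skipped because N − 1 = 0 for them.

```python
from math import sqrt


def merging_r_factors(reflections):
    """Return (R_merge, R_rim, R_pim) in percent.

    reflections: dict mapping an hkl index to the list of intensities I_i
    observed for that reflection (hypothetical input layout).
    """
    num_merge = num_rim = num_pim = denom = 0.0
    for intensities in reflections.values():
        n = len(intensities)
        if n < 2:
            continue  # the N/(N-1) weights are undefined for N = 1
        mean_i = sum(intensities) / n
        abs_dev = sum(abs(i - mean_i) for i in intensities)
        num_merge += abs_dev
        num_rim += sqrt(n / (n - 1)) * abs_dev   # redundancy-independent
        num_pim += sqrt(1 / (n - 1)) * abs_dev   # precision-indicating
        denom += sum(intensities)
    if denom == 0.0:
        raise ValueError("no multiply measured reflections")
    return tuple(100.0 * x / denom for x in (num_merge, num_rim, num_pim))


if __name__ == "__main__":
    demo = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0, 56.0]}
    print(merging_r_factors(demo))
```

Because the weight [1/(N−1)]^(1/2) decreases as N grows, Rpim falls when data sets are merged to higher redundancy, whereas Rrim does not reward redundancy in this way.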


Materials and methods 


Protein purification and crystallization 

Recombinant TGEV Mpro was expressed and purified as previously 
described for the HCoV and FIPV main proteinases (Ziebuhr et al., 1997; 
Hegyi et al., 2002). Briefly, the coding sequence of the TGEV Mpro was 
inserted into the XmnI and BamHI sites of pMal-c2 plasmid DNA (New 
England Biolabs). The resulting plasmid, pMal-Mpro, was used to 
transform Escherichia coli TB1 cells. The maltose-binding protein 
(MBP)-TGEV Mpro fusion protein was purified by amylose–agarose 
chromatography, cleaved with factor Xa, and the recombinant Mpro 
(residues Ser1–Gln302) was purified by hydrophobic interaction, anion 
exchange and size exclusion chromatography (Hegyi et al., 2002). The 
purified and concentrated TGEV Mpro (12.5 mg/ml) was stored in 12 mM 
Tris-HCl pH 7.5, 120 mM NaCl, 1 mM dithiothreitol (DTT), 0.1 mM 
EDTA. This protein solution was used to crystallize Mpro by the hanging 
drop vapour diffusion method at 4°C. The best crystals, which were of 
triangular shape and had dimensions of ~0.3 × 0.25 × 0.3 mm, were 
obtained by using 100 mM HEPES pH 8.8, 1.8 M ammonium sulfate, 6% 
MPD, 5 mM DTT and 4% dioxane as the reservoir and grew in ~10 days. 


Incorporation of selenomethionine 

The Mpro structure could not be solved using conventional molecular 
replacement techniques. Therefore, selenomethionine (SeMet)-substituted 
TGEV Mpro was produced. The coding sequence of the MBP-TGEV 
Mpro fusion protein was inserted into pET-11d (Novagen), and the 
resulting plasmid, pET-TGEV-Mpro, was used to transform the 
methionine-auxotrophic B834(DE3) E.coli strain (Novagen), which was 
propagated in minimal medium containing 40 µg/ml 
seleno-L-methionine. The SeMet-substituted TGEV Mpro was purified as described 
above and concentrated to 9.5 mg/ml. Crystals of the SeMet-substituted 
Mpro were grown as described for the native protein but using 2 M 
ammonium sulfate and 8% MPD. 


Diffraction data collection 

Crystals used for data collection were rinsed with mustard oil and 
cryo-cooled in liquid nitrogen. Diffraction data up to 1.95 Å resolution were 
collected from native crystals at 100 K on the X-ray diffraction beamline 
at ELETTRA (Sincrotrone Trieste, Trieste, Italy), using a Mar165 CCD 
detector (Table II). MAD data sets were collected to 2.8 Å resolution at 
four wavelengths using a Mar165 CCD detector on beamline BW7A of 
the EMBL Outstation at DESY (Hamburg, Germany). SeMet data sets 
were collected for the f″ maximum and f′ minimum wavelengths. 
Additional data were collected at remote wavelengths below and above 
the Se K-edge (Table II). Data integration and scaling were performed 
using DENZO and SCALEPACK (Otwinowski and Minor, 1997). 


Structure determination 

The unit cell dimensions, as well as the self-rotation function (ALMN; 
CCP4, 1994), implied that several monomers were present in the 
asymmetric unit. A Matthews coefficient (Matthews, 1968) of 2.3 Å³/Da 
and a solvent content of 51% were obtained assuming six molecules in the 
asymmetric unit. The bottleneck of the structure determination was the 
identification of the 60 selenium positions (six monomers with 10 Se 
each). Solving the problem by SnB v2.0 (Weeks and Miller, 1999) 
required data of increased precision, which were obtained by averaging of 
several data sets and monitoring the process by Rpim (Weiss and 
Hilgenfeld, 1997). Only after we had combined three merged peak- 
wavelength data sets with two merged edge-wavelength data sets 
(redundancy = 18) were we able to obtain 105 solutions (from 5000 
trials) with significantly reduced minimal function values (Rmin = 0.49, 
CC = 0.51; Hauptman, 1991) (details to be published elsewhere). The 
positions of the best 60 atom solutions from SnB were examined for NCS. 
In total, 37 positions were found to obey a 2-fold NCS. This symmetry 
predicted a further 11 positions. All 48 positions were used in MLPHARE 
(CCP4, 1994) for phasing, followed by solvent flattening and NCS 
averaging in DM (Cowtan and Main, 1996). The resulting electron 
density maps were of sufficient quality for chain tracing. The first 
monomer was built manually into the experimental electron density map, 
using the program ‘O’ (Jones et al., 1991). All other monomers were 
generated by NCS. NCS restraints were applied during the initial stages of 
refinement at low resolution and later gradually released as the resolution 
limit was extended to 1.96 Å. 
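
The check of heavy-atom sites against a 2-fold NCS operator described above can be sketched as follows. This is a minimal illustration under simplifying assumptions and is not the procedure actually used: the sites are taken to be in orthogonal Å coordinates, crystallographic symmetry and unit-cell translations are ignored, and the function names, the operator (`rot`, `trans`) and the distance tolerance `tol` are hypothetical.

```python
from math import dist


def apply_op(rot, trans, p):
    """Apply x' = R x + t to a 3D point p."""
    return tuple(
        sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3)
    )


def ncs_pairs(sites, rot, trans, tol=1.0):
    """Classify sites under one NCS operator.

    Returns (obeying, predicted): obeying is a list of (i, j) index pairs where
    the NCS image of site i falls within tol of site j; predicted holds the
    image positions of sites for which no partner was found.
    """
    obeying, predicted = [], []
    for i, p in enumerate(sites):
        image = apply_op(rot, trans, p)
        partner = next(
            (j for j, q in enumerate(sites) if j != i and dist(image, q) <= tol),
            None,
        )
        if partner is not None:
            obeying.append((i, partner))
        else:
            predicted.append(image)  # candidate site implied by the symmetry
    return obeying, predicted


if __name__ == "__main__":
    two_fold_z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]  # 180° rotation about z
    sites = [(1.0, 2.0, 3.0), (-1.0, -2.0, 3.2), (4.0, 0.0, 1.0)]
    print(ncs_pairs(sites, two_fold_z, (0.0, 0.0, 0.0), tol=0.5))
```

In this sketch, sites whose images coincide with another site correspond to positions that obey the NCS, while the unmatched images play the role of the additional positions predicted by the symmetry.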

Cycles of adjustments to the model with O and subsequent refinement 
using the program CNS (Brünger et al., 1998) converged to an Rfree of 
0.256 and a crystallographic R-factor of 0.210. Data quality and 
refinement statistics are given in Table III. The quality of the structural 
model and its agreement with the structure factors were checked with 
programs PROCHECK (Laskowski et al., 1993), WHATCHECK 
(Vriend, 1990) and SFCHECK (Vaguine et al., 1999). Solvent 
accessibility was calculated using the algorithm of Lee and Richards 
(1971; program NACCESS), using a solvent probe of radius 1.4 Å. The 
molecular diagrams were drawn using MOLSCRIPT (Kraulis, 1991) and 
rendered with RASTER 3D (Bacon and Anderson, 1988). Atomic 
coordinates and structure factors have been submitted to the RCSB 
Protein Data Bank under accession code 1LVO. 


Table III. Phasing statistics, refinement statistics and model quality

Phasing
  FOM^a before solvent flattening                      0.48
  FOM^a after solvent flattening (no averaging)        0.72
  FOM^a after solvent flattening (with averaging)      0.79
Refinement
  Resolution (Å)                                       50–1.96
  R-factor^b                                           0.210
  Rfree                                                0.256
No. of non-hydrogen atoms [average B-value (Å²)]
  Protein (main chain)                                 7198 (46.1)
  Protein (side chain)                                 6613 (47.2)
  Water                                                1006 (50.3)
  MPD                                                  48 (67.6)
  Sulfate                                              135 (57.1)
  Dioxane                                              54 (71.7)
R.m.s. deviation from ideal geometry
  Bonds (Å)                                            0.017
  Angles (°)                                           1.9
  Improper dihedral angles (°)                         1.16

^a FOM = figure of merit.
^b R-factor = $\sum (|F_o| - k|F_c|) / \sum |F_o|$, where $k$ is the scale factor.
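
The R-factor defined in the footnotes to Table III can be computed as in the sketch below. It is illustrative only and is not the CNS implementation: it assumes the conventional reading in which the absolute value of each difference |Fo| − k|Fc| is summed, it takes k as the ratio Σ|Fo|/Σ|Fc| of the working set when none is supplied, and the function names and the `is_free` flag list are hypothetical.

```python
def r_factor(f_obs, f_calc, k=None):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the given reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("f_obs and f_calc must be non-empty and equal in length")
    sum_obs = sum(abs(f) for f in f_obs)
    if k is None:
        # Assumed simple overall scale factor.
        k = sum_obs / sum(abs(f) for f in f_calc)
    diff = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return diff / sum_obs


def r_work_and_free(f_obs, f_calc, is_free):
    """Return (R_work, R_free), scaling both sets with k from the working set."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    k = sum(abs(o) for o, _ in work) / sum(abs(c) for _, c in work)
    r_work = r_factor([o for o, _ in work], [c for _, c in work], k)
    r_free = r_factor([o for o, _ in test], [c for _, c in test], k)
    return r_work, r_free


if __name__ == "__main__":
    f_obs = [100.0, 50.0, 25.0, 40.0]
    f_calc = [90.0, 55.0, 30.0, 50.0]
    print(r_work_and_free(f_obs, f_calc, [False, False, False, True]))
```

Rfree uses the same expression but is evaluated only on reflections that were excluded from refinement, which is why it is normally somewhat higher than the working R-factor.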


Proteolytic activities of TGEV Mpro mutants 

For the expression of Mpro proteins with N- and C-terminal deletions 
(MproΔ184–302, MproΔ200–302, MproΔ1–5 and MproΔ1–5/Δ200–302), the 
corresponding Mpro coding sequences were amplified by PCR and 
inserted into XmnI–BamHI-digested pMal-c2 plasmid DNA. To substitute 
the Mpro residues Cys144 (by Ala) and His163 (by Leu), the 
corresponding codons were replaced in pMal-Mpro by site-directed 
mutagenesis using a recombination-PCR method (Yao et al., 1992). The 
details of the primers used for cloning and mutagenesis and the amino 
acid sequences of the recombinant proteins expressed and tested for 
proteolytic activity are given in Table I. The plasmid DNAs were 
transformed into E.coli TB1 cells and the recombinant proteins were 
synthesized, affinity purified and cleaved with factor Xa as described 
previously (Hegyi et al., 2002). The purity and structural integrity of the 
mutant proteins were analysed by SDS-PAGE. The control protein for 
this experiment, wild-type TGEV Mpro, was purified in an identical 
manner. Enzymatic activities of the mutant proteins were measured by 
using a peptide cleavage assay (Ziebuhr et al., 1997) with a peptide 
substrate representing the N-terminal TGEV Mpro autoprocessing site 
(H2N-VSVNSTLQ↓SGLRKMA-COOH; the arrow marks the 
scissile bond that is cleaved by Mpro). 
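
The position of the scissile bond in this peptide, and the resulting P and P′ subsite labels, can be derived as in the following sketch. It assumes that the C-terminal cleavage product begins with the Mpro N-terminal residues Ser1–Lys5 (SGLRK), as implied by the structure description; the function name and its return format are illustrative.

```python
def subsite_labels(peptide, product_n_terminus):
    """Locate the scissile bond and label residues P1, P2, ... and P1', P2', ...

    peptide: one-letter sequence of the substrate.
    product_n_terminus: first residues of the C-terminal cleavage product.
    Returns (cut, labels), where cut is the number of residues N-terminal
    to the scissile bond.
    """
    cut = peptide.index(product_n_terminus)  # raises ValueError if absent
    if cut == 0:
        raise ValueError("no residues N-terminal to the scissile bond")
    labels = {}
    # P positions count backwards from the scissile bond.
    for offset, residue in enumerate(reversed(peptide[:cut]), start=1):
        labels[f"P{offset}"] = residue
    # P' positions count forwards from the scissile bond.
    for offset, residue in enumerate(peptide[cut:], start=1):
        labels[f"P{offset}'"] = residue
    return cut, labels


if __name__ == "__main__":
    cut, labels = subsite_labels("VSVNSTLQSGLRKMA", "SGLRK")
    print(cut, labels["P2"], labels["P1"], labels["P1'"])
```

For the 15mer used here, this places glutamine at P1 and leucine at P2, with the serine that becomes Ser1 of the mature proteinase at P1′.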